Resolving Neyman's Paradox
Max Albert

Brit. J. Phil. Sci. 53 (2002), 69–76. © British Society for the Philosophy of Science 2002
Abstract
According to Fisher, a hypothesis specifying a density function for X is falsified (at the level of significance α) if the realization of X is in the size-α region of lowest densities. However, non-linear transformations of X can map low-density into high-density regions. Apparently, then, falsifications can always be turned into corroborations (and vice versa) by looking at suitable transformations of X (Neyman's Paradox). The present paper shows that, contrary to the view taken in the literature, this provides no argument against a theory of statistical falsification.

1 The problems of statistical falsification
2 Redhead's version of Neyman's Paradox
3 A counterargument
4 Different measurement processes

1 The problems of statistical falsification

From a theoretical perspective, statistical inference is a serious problem for falsificationism. Actual tests of scientific hypotheses in most cases involve statistical arguments. Falsificationism is therefore plausible only if it can deal with statistical hypotheses. However, there is no falsificationist theory of statistical inference that is accepted even by falsificationists.[1]

Except for the trivial case where a zero-probability event is observed, a falsification of a statistical hypothesis presupposes the choice of a rejection region, that is, an event that is possible under the hypothesis in question but the observation of which is nevertheless taken as a falsification.[2] Following Popper ([1984]) and Gillies ([1971], [1973]), the rejection region is to be chosen according to a methodological rule, a falsifying rule for statistical inference (FRSI).

According to Neyman and Pearson's ([1933]) theory of statistical inference (NPT), such a rule should take two kinds of errors into account, usually called errors of the first and of the second kind. A first-kind error is an erroneous falsification of a true hypothesis, while a second-kind error is the failure to falsify a false hypothesis (erroneous corroboration). The risk of a first-kind error can be controlled by choosing a rejection region that has a sufficiently small probability α, also called the level of significance, under the hypothesis. The risk of a second-kind error has to be controlled by considering at least one explicit alternative to the hypothesis under test. The optimal test in the case of only two alternatives minimizes the probability β of not falsifying the original hypothesis if the alternative is true. Equivalently, it maximizes the power 1 − β. The NPT is acceptable to falsificationists if and only if the complete set of alternative hypotheses considered is part of the background knowledge, i.e. can be taken for granted in the context of the inquiry.

[1] Forster ([1995], p. 402) explains the attraction of Bayesianism in philosophy of science by the fact that the logic-based accounts of scientific inference (which include falsificationism) have nothing to say about statistical problems. Nevertheless, statistical practice is often quite falsificationist in spirit; cf. Gillies ([forthcoming]) on Neyman's use of the χ² test. This causes problems for non-falsificationists, who have to argue that standard and successful practices are in fact illegitimate.

[2] 'Rejection' and 'falsification' are used as synonyms throughout, although this is not always in accordance with the use of the term 'rejection' in the statistical literature.
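To make the two error probabilities concrete, the following minimal sketch (an illustration added here, not part of Albert's text) computes the critical value, β, and the power of the most powerful test for a single observation under two simple hypotheses; the specific hypotheses H0: X ~ N(0,1) and H1: X ~ N(2,1), the level 0.05, and the use of scipy are assumptions of the example.

```python
# Illustrative sketch (assumed hypotheses, not from the paper):
#   H0: X ~ N(0, 1)  versus  H1: X ~ N(2, 1), one observation.
# By the Neyman-Pearson lemma, the most powerful level-0.05 test
# rejects H0 for large x.
from scipy.stats import norm

alpha = 0.05                        # level of significance (first-kind error risk)
c = norm.ppf(1 - alpha)             # critical value: P(X >= c | H0) = alpha

beta = norm.cdf(c, loc=2, scale=1)  # second-kind error: no rejection although H1 is true
power = 1 - beta                    # probability of rejecting H0 when H1 is true

print(f"c = {c:.3f}, beta = {beta:.3f}, power = {power:.3f}")
# c = 1.645, beta = 0.361, power = 0.639
```

Any other size-0.05 rejection region (e.g. the two tails, which here yield a power of only about 0.52) has a larger β against this alternative; that is the sense in which the NPT test is optimal.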
If, for example, it is known that a certain procedure guarantees random sampling from a finite population (in the sense of every element of the population having the same probability of being included in the sample), then there is a fixed set of hypotheses concerning the population average of any variable, and these hypotheses give rise to a fixed set of statistical hypotheses. The random-sampling hypothesis is part of the background knowledge, and the set of hypotheses in which we can use an NP test is determined by the logical consequences of this basic hypothesis together with certain non-statistical hypotheses about the population. In such cases, a falsificationist can agree with Neyman ([1965], p. 448) that there is no difference between estimation (choosing the in some sense best hypothesis from a given set) and testing.[3]

When, however, the question arises (as it must at some stage) as to whether the basic hypothesis in the background is true, the situation is different. One may be able to find some test based on a still more general basic hypothesis, but if the hypothesis under scrutiny becomes more and more general, the NPT quickly runs out of tests. In practice, therefore, some assumptions are tested with the help of tests that are quite general but questionable from the standpoint of the NPT, like the χ² test. Thus, a falsificationist interpretation of the NPT must be based on some other, more fundamental approach answering the question of how to assess, as econometricians are wont to say, the assumptions of the statistical model. At this level, at least, tests should be independent from alternative hypotheses (Gillies [1971], [1973], [forthcoming]; Albert [1992]).

The obvious candidate for a theory of statistical falsification, Gillies' ([1971], [1973]) theory, is open to three important objections that have already been raised against Fisher's theory of significance testing (cf. Fisher [1990], Spielman [1974]). Consider a statistical hypothesis stating a distribution function for a one-dimensional random variable (rv). Let us assume that we have an FRSI specifying, without recourse to alternative hypotheses, a rejection region for a single observation of the rv. Then the following three problems must be solved.[4]

1. Selection of a test statistic. It is not obvious how to extend the FRSI to the case of n > 1 observations. The solution is to select a one-dimensional test statistic; however, there are many candidates.

2. Optional stopping. The rule determining the number of observations, the stopping rule, has a potential influence on the level of significance if the decision to stop is not independent from the observations (optional stopping). For any experiment, there are always several interpretations yielding different significance levels: the conventional interpretation that assumes stopping to be independent from observations, and many alternative interpretations specifying different ways of optional stopping. If one concludes that optional stopping makes a difference, the experimenter's intentions matter for the evaluation of otherwise indistinguishable observations (see the simulation sketch below).

3. Neyman's Paradox. Assume that the rv is continuously distributed with a density. Fisher's and Gillies' theories select the lowest-density regions as rejection regions. Non-linear transformations of the rv can map low-density into high-density regions and vice versa, leading to different decisions on the basis of the same FRSI and the same observations.

[3] For these cases, the interpretation and extension of the NPT by Mayo ([1996]) fits into a falsificationist framework. See also Mayo and Spanos ([2000]) for further progress in rendering the idea of the severity of a test more precise. Note that the falsificationist idea of severity of tests relies on background knowledge that can be (or already has been) tested independently; cf. Musgrave's ([1974]) discussion of Hempel's raven paradox.

[4] The problem of selecting a test statistic originally motivated Neyman and Pearson ([1930], [1933]) to modify Fisher's theory; see also Hacking ([1965], pp. 75–81). On optional stopping, see Hacking ([1965], pp. 107–9) and Berger and Berry ([1988]). Neyman's Paradox, originally formulated in connection with the t test, goes back to a 1929 paper of Neyman; cf. Neyman and Pearson ([1930], p. 101n) and Neyman ([1952], pp. 45–51). The general formulation is due to Redhead ([1974]).
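The following simulation sketch (my illustration; the setup, sample sizes, and numbers are assumptions, not Albert's) shows how one form of optional stopping inflates the level of significance: an experimenter who applies the level-0.05 Gauss test for the mean of i.i.d. N(0,1) data after every observation, stopping as soon as it rejects, commits a first-kind error far more often than the nominal 5%.

```python
# Optional stopping sketch (illustrative assumptions, not from the paper).
# Data are i.i.d. N(0, 1), so H0 (mean 0, known variance 1) is true. A
# "peeking" experimenter tests after each of up to 50 observations with
# the Gauss test at nominal level 0.05 and stops at the first rejection.
import numpy as np

rng = np.random.default_rng(0)
n_max, trials, z_crit = 50, 20_000, 1.96
false_rejections = 0

for _ in range(trials):
    x = rng.standard_normal(n_max)       # data generated under H0
    n = np.arange(1, n_max + 1)
    z = np.cumsum(x) / np.sqrt(n)        # Gauss statistic after each observation
    if np.any(np.abs(z) >= z_crit):      # did any interim look "reject"?
        false_rejections += 1

print(false_rejections / trials)         # roughly 0.3 instead of the nominal 0.05
```

The very same data, reported under the conventional fixed-sample interpretation, would count as a single level-0.05 test; which significance level is the right one thus depends on the stopping rule, i.e. on the experimenter's intentions.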
The NPT solves the first and the third problem by appealing to alternative hypotheses. As already argued, this is not always an option for falsificationists. The second problem also arises for the NPT.

However, these problems pose no insurmountable obstacles for a theory of statistical falsification. Just adopting some test by convention (e.g. the χ² goodness-of-fit test for discrete hypotheses) goes far in solving the first problem. Mayo ([1996]) argues that one actually should worry about optional stopping, thus attacking the position of adherents of the likelihood principle (Bayesians and others) that the insensitivity of likelihood-based inferences to optional stopping is a point in their favor. The present note argues that Neyman's Paradox does not arise if one accepts a reasonable condition concerning the scope of an FRSI.

2 Redhead's version of Neyman's Paradox

The very general nature of Neyman's Paradox has been argued most forcefully by Redhead ([1974]) in his critique of Gillies' rule. Consider the hypothesis X ~ N, meaning that the rv X is distributed according to the cumulative distribution function N of the standard normal distribution, and restrict the analysis to the case of a single observation. The so-called Gauss test at a level of significance of 0.05 then requires rejection if |x| ≥ 1.96. This test is accepted by Fisher and Gillies, at least in principle.

Consider the rv Y = t(X), where, by definition,

    t(X) = N⁻¹(3/2 − N(X))  if X ≥ 0,
    t(X) = N⁻¹(1/2 − N(X))  if X < 0,

and where therefore Y ~ N. The same Gauss test can be applied to Y as well. If X is in the rejection region, Y is not, and vice versa. Thus one can always reformulate the hypothesis in such a way that it is not rejected by mathematically the same test.

Redhead's argument is completely general. Transformations that map the tails of the original distribution to the centre of the new distribution can always be found. Moreover, Redhead's transformation guarantees for any unimodal distribution with mode 0 that X and t(X) are distributed identically.
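A numerical check of Redhead's construction may be helpful (my sketch; the simulation, seed, sample size, and scipy-based implementation are not in the original). It confirms that Y = t(X) is again standard normal, that both Gauss tests have size 0.05, and that their rejection regions, expressed in terms of x, are disjoint: |x| ≥ 1.96 for X and, up to rounding, |x| ≤ 0.063 for Y.

```python
# Numerical check (illustrative sketch) of Redhead's transformation:
#   t(x) = N^{-1}(3/2 - N(x)) for x >= 0,   N^{-1}(1/2 - N(x)) for x < 0,
# with N the standard normal cdf. If X ~ N(0,1), then Y = t(X) ~ N(0,1),
# but the level-0.05 Gauss test applied to y = t(x) rejects exactly when
# |x| is close to 0 rather than in the tails.
import numpy as np
from scipy.stats import norm, kstest

def t(x):
    y = np.empty_like(x)
    pos = x >= 0
    y[pos] = norm.ppf(1.5 - norm.cdf(x[pos]))
    y[~pos] = norm.ppf(0.5 - norm.cdf(x[~pos]))
    return y

rng = np.random.default_rng(1)
x = rng.standard_normal(100_000)
y = t(x)

print(kstest(y, "norm"))                 # Kolmogorov-Smirnov check that Y ~ N(0,1)

reject_x = np.abs(x) >= 1.96             # Gauss test applied to X
reject_y = np.abs(y) >= 1.96             # the same Gauss test applied to Y = t(X)
print(reject_x.mean(), reject_y.mean())  # both close to the size 0.05
print(np.any(reject_x & reject_y))       # False: the rejection regions are disjoint
```

Since |t(x)| ≥ 1.96 is equivalent to |x| ≤ N⁻¹(0.525) ≈ 0.063, a realization that falsifies the hypothesis about X always corroborates the mathematically equivalent hypothesis about Y, and vice versa.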